A Biomimetic Approach Based on Immune Systems for Classification of Unstructured Data
In this paper we present the results of clustering unstructured data, in this
case textual data from the Reuters-21578 corpus, with a new biomimetic approach
based on the immune system. Before running our immune system, we converted the
textual data to numeric form using the n-grams approach. The novelty lies in the
hybridization of n-grams and immune systems for clustering. The experimental
results show that the proposed ideas are promising and indicate that this method
can solve the text clustering problem.
Comment: 10 pages, 4 figures
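The n-gram step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which n, which normalisation, or whether character or word n-grams were used, so the function name `char_ngrams`, the default `n=3`, and the lowercasing and whitespace collapsing below are all assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count the overlapping character n-grams of a text (hypothetical helper)."""
    # Assumed normalisation: lowercase and collapse runs of whitespace.
    cleaned = " ".join(text.lower().split())
    # Slide a window of width n over the cleaned string, one character at a time.
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


# Each document becomes a sparse vector of n-gram counts.
profile = char_ngrams("The cat", n=3)
```

Because the profile is built from raw characters, it needs no language-specific tokenizer or stemmer, which is the usual reason for choosing n-grams over a bag of words.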
Visualization and clustering by 3D cellular automata: Application to unstructured data
Given the limited performance of 2D cellular automata, both in terms of space
when the number of documents increases and in terms of cluster visualization, our
motivation was to experiment with these cellular automata in a higher dimension
and observe the impact of dimension on the quality of results. The representation of textual
data was carried out with a vector model whose components are derived from a
global weighting over the corpus, Term Frequency-Inverse Document Frequency
(TF-IDF). The WordNet thesaurus was used to address the problem of
lemmatizing words, because the representation used in this study is
the bag-of-words model. Another, language-independent method was also used
to represent the textual records: n-grams. Several measures of
similarity were tested. To validate the classification we used two
evaluation measures, the F-measure (based on recall and precision) and
entropy. The results are promising and support the idea of increasing the
dimension to address the problem of the spatiality of the classes. The results
obtained in terms of class purity (i.e. the minimum value of entropy) show that,
as the number of documents grows, the results are better for 3D cellular
automata, which was not evident in the 2D case. In terms of spatial
navigation, 3D cellular automata provide better visualization performance
than 2D cellular automata.
Comment: 10 pages, 8 figures
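The entropy criterion used above to judge class purity can be sketched as follows. This is the common size-weighted cluster entropy, assumed here for illustration; the abstract does not give the exact formula, and the names `cluster_entropy` and `weighted_entropy` are hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (base 2) of the true class labels inside one cluster."""
    total = len(labels)
    counts = Counter(labels)
    # A cluster holding a single class has entropy 0, i.e. maximal purity.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def weighted_entropy(clusters):
    """Average cluster entropy, weighted by cluster size (empty clusters skipped)."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters if c)


# Two clusters of true labels: one pure, one evenly mixed.
score = weighted_entropy([["earn", "earn"], ["earn", "grain"]])
```

Lower values indicate purer clusters, so comparing this score between a 2D and a 3D run on the same documents is one way to read the comparison described in the abstract.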